GOODNESS-OF-FIT PROBLEM FOR ERRORS
IN NONPARAMETRIC REGRESSION: 
DISTRIBUTION FREE APPROACH 

By Estate V. Khmaladze and Hira L. Koul 1 

Victoria University of Wellington and Michigan State University 

This paper discusses asymptotically distribution free tests for the
classical goodness-of-fit hypothesis of an error distribution in
nonparametric regression models. These tests are based on the same
martingale transform of the residual empirical process as used in the one
sample location model. This transformation eliminates the extra
randomization due to covariates, but not that due to the errors, which is
intrinsically present in the estimators of the regression function. Thus,
tests based on the transformed process have, generally, better power. The
results of this paper are applicable as soon as asymptotic uniform linearity
of the nonparametric residual empirical process is available. In particular,
they are applicable under the conditions stipulated in recent papers
of Akritas and Van Keilegom and Müller, Schick and Wefelmeyer.

1. Introduction. Consider a sequence of i.i.d. pairs of random variables
$\{(X_i, Y_i)\}_{i=1}^n$, where $X_i$ are $d$-dimensional covariates and
$Y_i$ are the one-dimensional responses. Suppose $Y_i$ has regression in
mean on $X_i$, that is, there is a regression function $m(\cdot)$ and a
sequence of i.i.d. zero mean innovations $\{e_i, 1 \le i \le n\}$,
independent of $\{X_i\}$, such that

$Y_i = m(X_i) + e_i, \qquad i = 1, \ldots, n.$

This regression function, as in most applications, is generally unknown and
we do not make assumptions about its possible parametric form, so that we
need to use a nonparametric estimator $\hat m_n(\cdot)$ based on
$\{(X_i, Y_i)\}_{i=1}^n$.

The problem of interest here is to test the hypothesis that the common
distribution function (d.f.) of $e_i$ is a given $F$. Since $m(\cdot)$ is
unknown we can only use residuals

$\hat e_i = Y_i - \hat m_n(X_i), \qquad i = 1, \ldots, n,$

which, obviously, are not i.i.d. anymore. Let $F_n$ and $\hat F_n$ denote
the empirical d.f. of the errors $e_i$, $1 \le i \le n$, and the residuals
$\hat e_i$, $1 \le i \le n$, respectively, and let

$v_n(x) = \sqrt{n}\,[F_n(x) - F(x)], \qquad \hat v_n(x) = \sqrt{n}\,[\hat F_n(x) - F(x)], \qquad x \in \mathbb{R},$

denote empirical and "estimated" empirical processes.

Akritas and Van Keilegom (2001) and Müller, Schick and Wefelmeyer
(2007) established, under the null hypothesis and some assumptions and
when $d = 1$, the following uniform asymptotic expansion of $\hat v_n$:

(1.1) $\hat v_n(x) = v_n(x) - f(x) R_n + \xi_n(x), \qquad \sup_x |\xi_n(x)| = o_p(1),$

where

(1.2) $R_n = O_p(1).$

Basically, the term $R_n$ is made up by the sum

$R_n = n^{-1/2} \sum_{i=1}^n [\hat m_n(X_i) - m(X_i)],$

but using a special form of the estimator $\hat m_n$, Müller, Schick and
Wefelmeyer obtained an especially simple form for it:

(1.3) $R_n = n^{-1/2} \sum_{i=1}^n e_i.$

Müller, Schick and Wefelmeyer (2009) provide a set of sufficient conditions
under which (1.1)-(1.3) continue to hold for the case $d > 1$.

In the case of parametric regression, where the regression function is of
the parametric form $m(\cdot) = m(\cdot, \theta)$ and the unknown parameter
$\theta$ is replaced by its estimator $\hat\theta_n$, similar asymptotic
expansions have been established in Loynes (1980), Koul (2002) and
Khmaladze and Koul (2004). However, the nonparametric case is more complex
and it is remarkable that the asymptotic expansions (1.1) and (1.2) are
still true.

The above expansion leads to the central limit theorem for the process
$\hat v_n$, and, hence, produces the null limit distribution for test
statistics based on this process. However, the same expansion makes it
clear that the statistical inference based on $\hat v_n$ is inconvenient in
practice and even infeasible; not only does the limit distribution of
$\hat v_n$ after time transformation $t = F(x)$ still depend on the
hypothetical d.f. $F$, but it depends also on the estimator $\hat m_n$
(and, in general, on the regression function $m$ itself), that is, it is
different for different estimators. Since goodness-of-fit statistics are
essentially nonlinear functionals of the underlying process with difficult
to calculate limit distributions, it is practically inconvenient to be
obliged to do substantial computational work to evaluate their null
distributions every time we test the hypothesis. Note, in particular, that
if we try to use some kind of bootstrap simulations, we would have to
compute the nonparametric estimator $\hat m_n$ for every simulated
subsample, which makes it especially time consuming.

Starting with asymptotic expansion (1.1) of Akritas and Van Keilegom
and Müller, Schick and Wefelmeyer, our goal is to show that the
above-mentioned complications can be avoided in a way that is technically
surprisingly simple. Namely, we present the transformed process $w_n$,
which, after time transformation $t = F(x)$, converges in distribution to a
standard Brownian motion, for any estimator $\hat m_n$ for which (1.1) is
valid. One would expect that this is done at the cost of some power. We
shall see, however, somewhat unexpectedly, that tests based on this
transformed process $w_n$ should, typically, have better power than those
based on $\hat v_n$. Perhaps it is worth emphasizing that to achieve this
goal we actually need only the smallness of the remainder process $\xi_n$
and not the asymptotic boundedness (1.2) in the expansion (1.1).

We end this section by mentioning some recent applications of martingale 
transform, in different types of regression problems, by Koenker and Xie 
(2002, 2006), Bai (2003), Delgado, Hidalgo and Velasco (2005) and Koul 
and Yi (2006). 

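2. Transformed process. Suppose the d.f. $F$ has an absolutely continuous
density $f$ with a.e. derivative $\dot f$ and finite Fisher information for
location. Let $\psi_f = -\dot f / f$ denote the score function for the
location family $F(\cdot - \theta)$, $\theta \in \mathbb{R}$, at
$\theta = 0$; we can assume that $\theta = 0$ without loss of generality.
Then,

(2.1) $\int \psi_f^2(x)\, dF(x) < \infty.$

Consider the augmented score function

$h(x) = \begin{pmatrix} 1 \\ \psi_f(x) \end{pmatrix}$

and the augmented incomplete information matrix

$\Gamma_{F(x)} = \int_x^\infty h(y) h^T(y)\, dF(y) = \begin{pmatrix} 1 - F(x) & f(x) \\ f(x) & \sigma_f^2(x) \end{pmatrix}, \qquad x \in \mathbb{R},$

with $\sigma_f^2(x) = \int_x^\infty \psi_f^2(y)\, dF(y)$.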

For a signed measure $\nu$ for which the following integral is well defined,
let

$K(x, \nu) = \int_{-\infty}^x h^T(y)\, \Gamma_{F(y)}^{-1} \int_y^\infty h(z)\, d\nu(z)\, dF(y), \qquad x \in \mathbb{R}.$

Occasionally, $\nu$ will be a vector of signed measures, in which case $K$
will be a vector also.

Our transformed process $w_n$ is defined as

(2.2) $w_n(x) = \sqrt{n}\,[\hat F_n(x) - K(x, \hat F_n)], \qquad x \in \mathbb{R}.$

We shall show that $w_n$ converges in distribution to the Brownian motion
$w$ in time $F$, that is, $w_n(F^{-1})$ converges weakly to standard
Brownian motion on the interval $[0, 1]$, where
$F^{-1}(u) = \inf\{x : F(x) \ge u\}$, $0 \le u \le 1$.
To begin with, observe that the process $w_n$ can be rewritten as

(2.3) $w_n(x) = \hat v_n(x) - K(x, \hat v_n).$

Indeed, $F(x)$ is the first coordinate of the vector-function
$H(x) = \int_{-\infty}^x h\, dF = (F(x), -f(x))^T$, and we will see that

(2.4) $H^T(x) - K(x, H^T) = 0 \qquad \forall x \in \mathbb{R}.$

Subtracting $\sqrt{n}$ times the first coordinate of this identity from
(2.2) yields (2.3). Using asymptotic expansion (1.1) we can rewrite

(2.5) $w_n(x) = v_n(x) - K(x, v_n) + \eta_n(x), \qquad \eta_n(x) = \xi_n(x) - K(x, \xi_n),$

where one expects $\eta_n$ to be "small" (see Section 4), and the main part
on the right not to contain the term $f(F^{-1}(t)) R_n$ of that expansion.
This is true again because of (2.4) and because the second coordinate of
$H(x)$ is $-f(x)$.
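The definition (2.2) can be made concrete with a small numerical sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: it takes $F$ to be the standard normal d.f., for which $h(y) = (1, y)^T$ and $\sigma_f^2(y) = y f(y) + 1 - F(y)$, approximates the outer integral in $K$ by the trapezoidal rule on a grid, and truncates the lower limit at a finite point. The function names, the grid size and the truncation point are hypothetical choices.

```python
import math
import random


def pdf(x):
    """Standard normal density f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def sf(x):
    """Standard normal survival function 1 - F(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def h_gamma_inv(y, s1, s2):
    """Return h(y)^T Gamma_{F(y)}^{-1} (s1, s2)^T for h(y) = (1, y)^T."""
    g11, g12 = sf(y), pdf(y)
    g22 = y * pdf(y) + sf(y)          # sigma_f^2(y) for the normal d.f.
    det = g11 * g22 - g12 * g12
    z1 = (g22 * s1 - g12 * s2) / det  # first coordinate of Gamma^{-1} s
    z2 = (g11 * s2 - g12 * s1) / det  # second coordinate of Gamma^{-1} s
    return z1 + y * z2


def transformed_process(resid, lo=-8.0, m=800):
    """Grid values of w_n = sqrt(n) [F_n - K(., F_n)] on [lo, max residual].

    The inner integral of h over (y, infinity) with respect to the
    empirical d.f. is the vector (share of residuals above y,
    their sum divided by n); it vanishes above the largest residual.
    """
    n = len(resid)
    r = sorted(resid)
    step = (r[-1] - lo) / m
    grid = [lo + j * step for j in range(m + 1)]
    integrand = []
    for y in grid:
        tail = [e for e in r if e > y]
        if tail:
            val = h_gamma_inv(y, len(tail) / n, sum(tail) / n) * pdf(y)
        else:
            val = 0.0
        integrand.append(val)
    w, k = [], 0.0
    for j, y in enumerate(grid):
        if j > 0:
            k += 0.5 * step * (integrand[j - 1] + integrand[j])
        fn = sum(1 for e in r if e <= y) / n
        w.append(math.sqrt(n) * (fn - k))
    return grid, w
```

A statistic such as $\sup_x |w_n(x)|$ can then be approximated by `max(abs(v) for v in w)`. Feeding the population quantities $(1 - F(y), f(y))$ into `h_gamma_inv` returns 1, which is the integrand form of the identity (2.4).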

The transformation $w_n$ is very similar to the one studied in Khmaladze
and Koul (2004), where the regression function is assumed to be parametric.
However, asymptotic behavior of the empirical distribution function
$\hat F_n$ here is more complicated. As a result, we have to prove the
smallness of the "residual process" $\eta_n$ in (2.5) differently (see
Section 4). Here we demonstrate that although, in this transformation,
singularity at $t = 1$ exists, the process $w_n(F^{-1})$ converges to its
weak limit on the closed interval $[0, 1]$; see Theorem 4.1(ii). Besides, we
explicitly consider the case of a possibly degenerate matrix
$\Gamma_{F(x)}$ and show that $w_n$ is still well defined; see Lemma 2.1.

If $\Gamma_{F(x)}$ is of the full rank for all $x \in \mathbb{R}$, then
(2.4) is obvious. For most d.f.'s $F$, the matrix $\Gamma_{F(x)}$ indeed is
not degenerate, that is, the coordinates $1$ and $\psi_f$ of $h$ are
linearly independent functions on the tail set $\{x > x_0\}$ for every
$x_0 \in \mathbb{R}$. However, if (and only if) for $x$ greater than some
$x_0$, the density $f$ has the form $f(x) = a e^{-ax}$, $a > 0$, the
function $\psi_f(x)$ equals the constant $a$, so that $1$ and $\psi_f(x)$
become linearly dependent for $x > x_0$. As this can indeed be the case in
applications, for example, for the double exponential distribution, it is
useful to show that (2.4) is still correct and the transformation (2.3)
still can be used.

The lemma below shows that although in this case $\Gamma_{F(x)}^{-1}$
cannot be uniquely defined, the function
$h^T(x)\, \Gamma_{F(x)}^{-1} \int_x^\infty h(y)\, d\mu(y)$, with
$\mu = v_n$ or $\mu = \hat v_n$, is well defined. Here it is more
transparent and simple to use also the time transformation $t = F(x)$.
Accordingly, let $u_n(t) = v_n(F^{-1}(t))$,
$\hat u_n(t) = \hat v_n(F^{-1}(t))$, $\gamma(t) = h(F^{-1}(t))$, and
$\Gamma_t = \int_t^1 \gamma(s) \gamma(s)^T\, ds$, $0 \le t \le 1$.

Lemma 2.1. Suppose, for some $x_0$ such that $0 < F(x_0) < 1$, the matrix
$\Gamma_{F(x)}$, for $x > x_0$, degenerates to the form

(2.6) $\Gamma_{F(x)} = (1 - F(x)) \begin{pmatrix} 1 & a \\ a & a^2 \end{pmatrix} \qquad \forall x > x_0$, some $a > 0$.

Then the equalities (2.4) and, hence, (2.3) are still valid. Besides,

$h^T(x)\, \Gamma_{F(x)}^{-1} \int_x^\infty h(y)\, dv_n(y) = -\frac{v_n(x)}{1 - F(x)} \qquad \forall x > x_0,$

or

$\gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) = -\frac{u_n(t)}{1 - t} \qquad \forall t \in (F(x_0), 1).$

A similar fact holds with $v_n$ ($u_n$) replaced by $\hat v_n$
($\hat u_n$).

Remark 2.1. The argument that follows is an adaptation and simplification
of a general treatment of the case of degenerate matrices $\Gamma_{F(x)}$,
given in Nikabadze (1987) and Tsigroshvili (1998).

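Proof of Lemma 2.1. Let $\gamma(t) = (1, a)^T$, $t = F(x)$. The image and
kernel of the linear operator $\Gamma_t$ in $\mathbb{R}^2$, respectively,
are

$\mathcal{I}(\Gamma_t) = \{b : b = \Gamma_t a \text{ for some } a \in \mathbb{R}^2\} = \{b : b = \beta (1 - t)(1, a)^T,\ \beta \in \mathbb{R}\};$

$\mathcal{K}(\Gamma_t) = \{a : \Gamma_t a = 0\} = \{a : a = c(-a, 1)^T,\ c \in \mathbb{R}\}.$

Moreover, both $\int_t^1 \gamma\, du_n$ and $H(F^{-1}(t))$ are in
$\mathcal{I}(\Gamma_t)$, and if $b \in \mathcal{I}(\Gamma_t)$ then
$\Gamma_t b = (1 - t)(1 + a^2) b$. Then $\Gamma_t^{-1}$ is any (matrix of)
linear operator on $\mathcal{I}(\Gamma_t)$ such that

$\Gamma_t^{-1} b = \frac{1}{(1 - t)(1 + a^2)}\, b + a, \qquad a \in \mathcal{K}(\Gamma_t).$

But $\gamma(t) = (1, a)^T$ is orthogonal to any $a \in \mathcal{K}(\Gamma_t)$
and therefore

(2.7) $\gamma^T(t)\, \Gamma_t^{-1} b = \frac{1}{(1 - t)(1 + a^2)}\, \gamma^T(t)\, b$

does not depend on the choice of $a \in \mathcal{K}(\Gamma_t)$ and, hence,
is defined uniquely. For $b = \int_t^1 \gamma(s)\, du_n(s)$ this gives the
equality in the lemma. Besides, for any $b \in \mathcal{I}(\Gamma_t)$,
$a \in \mathcal{K}(\Gamma_t)$,

$\gamma^T(t)\, \Gamma_t^{-1} \Gamma_t (b + a) = \gamma^T(t)\, \Gamma_t^{-1} \Gamma_t b = \gamma^T(t)\, b = \gamma^T(t)(b + a),$

which gives (2.4). The rest of the claim is obvious. □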
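The uniqueness argument in the proof can be checked numerically. The following is a minimal sketch, assuming the degenerate form (2.6): it builds one admissible value of $\Gamma_t^{-1}\gamma(t)$, namely $\gamma(t)/((1-t)(1+a^2))$ plus an arbitrary kernel element $c(-a, 1)^T$, and evaluates the quadratic form. The function name and the free constant `c` are illustrative assumptions.

```python
def quad_form_degenerate(t, a, c=0.0):
    """gamma^T Gamma_t^{-1} gamma for Gamma_t = (1 - t) [[1, a], [a, a^2]].

    Gamma_t^{-1} gamma is represented as gamma / ((1 - t)(1 + a^2)) plus
    the kernel element c * (-a, 1)^T; the result should not depend on c.
    """
    scale = (1.0 - t) * (1.0 + a * a)   # eigenvalue of Gamma_t on its image
    z = (1.0 / scale - c * a, a / scale + c)
    return 1.0 * z[0] + a * z[1]        # inner product with gamma = (1, a)
```

For any choice of `c` the value agrees with $1/(1 - t)$ up to rounding, which is the statement that the kernel component is annihilated by $\gamma^T(t)$.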
Now consider the leading term of (2.5) in time $t = F(x)$. It is useful to
consider its function parametric version, defined as

(2.8) $b_n(\varphi) = u_n(\varphi) - K_n(\varphi), \qquad \varphi \in L_2[0, 1],$

where $u_n(\varphi) = \int_0^1 \varphi(s)\, du_n(s)$, and

$K_n(\varphi) = K(\varphi, u_n) = \int_0^1 \varphi(t)\, \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s)\, dt.$

With slight abuse of notation, denote $b_n(\varphi)$ when
$\varphi(\cdot) = I(\cdot \le t)$ by

(2.9) $b_n(t) = u_n(t) - \int_0^t \gamma^T(u)\, \Gamma_u^{-1} \int_u^1 \gamma(s)\, du_n(s)\, du.$

Conditions for weak convergence of $u_n$ are well known: if
$\Phi \subset L_2[0, 1]$ is a class of functions such that the sequence
$u_n(\varphi)$, $n \ge 1$, is uniformly in $n$ equicontinuous on $\Phi$,
then $u_n \to_d u$ in $\ell^\infty(\Phi)$, where $u$ is standard Brownian
bridge; see, for example, van der Vaart and Wellner (1996). The conditions
for the weak convergence of $K_n$ to a great extent must be simpler,
because, unlike $u_n$, $K_n$ is a continuous linear functional in $\varphi$
on the whole of $L_2[0, 1]$, however, not uniformly in $n$. We will see, in
Proposition 2.1 below, that although, for every $\varepsilon > 0$, the
provisional limit in distribution of $K_n(\varphi)$, namely,

$K(\varphi) = K(\varphi, u) = \int_0^1 \varphi(t)\, \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du(s)\, dt,$

is continuous on $L_{2,\varepsilon}$, the class of functions in
$L_2[0, 1]$ which are equal to $0$ on the interval $(1 - \varepsilon, 1]$,
it is not continuous on $L_2[0, 1]$. Therefore it is unavoidable to use
some condition on $\varphi$ at $t = 1$. Condition (2.10) below still allows
$\varphi(t) \to \infty$ as $t \to 1$ (see examples below).

Theorem 2.1. (i) Let $L_{2,\varepsilon} \subset L_2[0, 1]$ be the subspace
of all square integrable functions which are equal to $0$ on the interval
$(1 - \varepsilon, 1]$. Then, $K_n \to_d K$ on $L_{2,\varepsilon}$, for any
$0 < \varepsilon < 1$.

(ii) Let, for an arbitrary small but fixed $\varepsilon > 0$, $C < \infty$,
and $\alpha < 1/2$, $\Phi_\varepsilon \subset L_2[0, 1]$ be a class of all
square integrable functions satisfying the following right tail condition:

(2.10) $|\varphi(s)| \le C\, [\gamma^T(s)\, \Gamma_s^{-1} \gamma(s)]^{-1/2}\, (1 - s)^{-1/2 - \alpha} \qquad \forall s > 1 - \varepsilon.$

Then, $K_n \to_d K$ on $\Phi_\varepsilon$.

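Proof. (i) The integral $\int_t^1 \gamma\, du_n$, as a process in $t$,
obviously converges in distribution to the Gaussian process
$\int_t^1 \gamma\, du$. Therefore, all finite-dimensional distributions of
$\gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma\, du_n$, for $t < 1$, converge
to the corresponding finite-dimensional distributions of the Gaussian
process $\gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma\, du$. Hence, for any
fixed $\varphi \in L_{2,\varepsilon}$, the distribution of $K_n(\varphi)$
converges to that of $K(\varphi)$. So, we only need to show tightness, or,
equivalently, equicontinuity of $K_n(\varphi)$ in $\varphi$. We have

$|K_n(\varphi)| \le \int_0^{1-\varepsilon} |\varphi(t)| \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) \right| dt \le \sup_{t \le 1 - \varepsilon} \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) \right| \int_0^{1-\varepsilon} |\varphi(t)|\, dt,$

while

$\sup_{t \le 1 - \varepsilon} \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) \right| \to_d \sup_{t \le 1 - \varepsilon} \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du(s) \right| = O_p(1).$

This proves that $K_n(\varphi)$ is equicontinuous in
$\varphi \in L_{2,\varepsilon}$ and (i) follows.

(ii) To prove (ii), what we need is to show the equicontinuity of
$K_n(\varphi)$ on $\Phi_\varepsilon$. But for this we need only to show
that for a sufficiently small $\varepsilon > 0$, and uniformly in $n$,

$\sup_{\varphi \in \Phi_\varepsilon} \left| \int_{1-\varepsilon}^1 \varphi(t)\, \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s)\, dt \right|$

is arbitrarily small in probability. Denote the envelope function for
$\varphi \in \Phi_\varepsilon$ by $\Psi$. Then, the above expression is
bounded above by

$\int_{1-\varepsilon}^1 |\Psi(t)| \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) \right| dt.$

However, bearing in mind that

$E \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) \right|^2 \le \gamma^T(t)\, \Gamma_t^{-1} \gamma(t) \qquad \forall t \in [0, 1],$

we obtain that

$E \int_{1-\varepsilon}^1 |\Psi(t)| \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) \right| dt \le \int_{1-\varepsilon}^1 |\Psi(t)|\, E \left| \gamma^T(t)\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, du_n(s) \right| dt$

$\le \int_{1-\varepsilon}^1 |\Psi(t)|\, [\gamma^T(t)\, \Gamma_t^{-1} \gamma(t)]^{1/2}\, dt \le C \int_{1-\varepsilon}^1 \frac{1}{(1 - t)^{1/2 + \alpha}}\, dt.$

The last integral can be made arbitrarily small for sufficiently small
$\varepsilon$. □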

Consequently, we obtain the following limit theorem for $b_n$. Recall, say
from van der Vaart and Wellner (1996), that the family of Gaussian random
variables $b(\varphi)$, $\varphi \in L_2[0, 1]$, with covariance function
$E\, b(\varphi) b(\varphi') = \int_0^1 \varphi(t) \varphi'(t)\, dt$ is
called (function parametric) standard Brownian motion on $\Phi$ if
$b(\varphi)$ is continuous on $\Phi$.

Theorem 2.2. (i) Let $\Phi$ be a Donsker class, that is, let
$u_n \to_d u$ in $\ell^\infty(\Phi)$. Then, for every $\varepsilon > 0$,

$b_n \to_d b$ in $\ell^\infty(\Phi \cap \Phi_\varepsilon),$

where $\{b(\varphi), \varphi \in \Phi\}$ is standard Brownian motion.

(ii) If the envelope function $\Psi(t)$ of (2.10) tends to a positive
(finite or infinite) limit at $t = 1$, then for the process (2.9) we have

$b_n \to_d b$ on $[0, 1]$.

Examples. Here, we discuss some examples analyzing the behavior of
the upper bound of (2.10) in the right tail. In all these examples we will
see that not only does the class of indicator functions satisfy (2.10), but
also a class of unbounded functions $\varphi$ with
$\varphi(s) = O((1 - s)^{-\alpha})$, $\alpha < 1/2$, as $s \to 1$,
satisfies this condition.

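Consider the logistic d.f. $F$ with the scale parameter 1, or equivalently
$\psi_f(x) = 2F(x) - 1$. Then $h(x) = (1, 2F(x) - 1)^T$ or
$\gamma(s) = (1, 2s - 1)^T$ and

$\Gamma_s = (1 - s) \begin{pmatrix} 1 & s \\ s & (1 - 2s + 4s^2)/3 \end{pmatrix}, \qquad \det(\Gamma_s) = \frac{(1 - s)^4}{3},$

$\Gamma_s^{-1} = \frac{3}{(1 - s)^3} \begin{pmatrix} (1 - 2s + 4s^2)/3 & -s \\ -s & 1 \end{pmatrix},$

so that indeed $\gamma^T(s)\, \Gamma_s^{-1} \gamma(s) = 4 (1 - s)^{-1}$, for
all $0 \le s < 1$.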
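The closed form for the logistic case is easy to confirm numerically. The sketch below is an illustration only: it evaluates $\gamma^T(s)\, \Gamma_s^{-1} \gamma(s)$ from the entries of $\Gamma_s$ given above, using the explicit inverse of a 2 by 2 matrix; the function name is a hypothetical choice.

```python
def logistic_quad_form(s):
    """gamma^T(s) Gamma_s^{-1} gamma(s) for the logistic d.f., 0 <= s < 1."""
    g11 = 1.0 - s
    g12 = s * (1.0 - s)
    g22 = (1.0 - s) * (1.0 - 2.0 * s + 4.0 * s * s) / 3.0
    det = g11 * g22 - g12 * g12
    c1, c2 = 1.0, 2.0 * s - 1.0       # coordinates of gamma(s)
    # x^T A^{-1} x for a symmetric 2x2 matrix A = [[g11, g12], [g12, g22]]
    return (g22 * c1 * c1 - 2.0 * g12 * c1 * c2 + g11 * c2 * c2) / det
```

Multiplying the returned value by $1 - s$ should give 4 for every $s$ in $[0, 1)$, up to rounding error.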

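Next, suppose $F$ is the standard normal d.f. Because here
$\psi_f(x) = x$, one obtains $h(x) = (1, x)^T$ and
$\sigma_f^2(x) = x f(x) + 1 - F(x)$. Let $\mu(x) = f(x)/(1 - F(x))$. Then,

$\Gamma_{F(x)} = (1 - F(x)) \begin{pmatrix} 1 & \mu(x) \\ \mu(x) & x\mu(x) + 1 \end{pmatrix},$

$\Gamma_{F(x)}^{-1} = \frac{1}{(1 - F(x))(x\mu(x) + 1 - \mu^2(x))} \begin{pmatrix} x\mu(x) + 1 & -\mu(x) \\ -\mu(x) & 1 \end{pmatrix}.$

Hence

$h^T(x)\, \Gamma_{F(x)}^{-1} h(x) = \frac{1}{1 - F(x)} \cdot \frac{1 - x\mu(x) + x^2}{x\mu(x) + 1 - \mu^2(x)}.$

Using the asymptotic expansion for the tail of the normal d.f. [see, e.g.,
Feller (1957), page 179], for $\mu(x)$ we obtain

$\mu(x) = \frac{x}{1 - S(x)}, \qquad \text{where } S(x) = \sum_{i=1}^\infty \frac{(-1)^{i-1} (2i - 1)!!}{x^{2i}} = \frac{1}{x^2} - \frac{3}{x^4} + \cdots.$

From this one can derive that
$(1 - x\mu(x) + x^2)/(x\mu(x) + 1 - \mu^2(x)) \sim 2$, $x \to \infty$, and
therefore $h^T(x)\, \Gamma_{F(x)}^{-1} h(x) \sim 2 (1 - F(x))^{-1}$,
$x \to \infty$, or equivalently,

$\gamma^T(s)\, \Gamma_s^{-1} \gamma(s) \sim 2 (1 - s)^{-1}, \qquad s \to 1.$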
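The limit 2 in the normal case can be seen numerically as well. The following sketch, given only as an illustration, evaluates the ratio $(1 - x\mu(x) + x^2)/(x\mu(x) + 1 - \mu^2(x))$ with $\mu(x) = f(x)/(1 - F(x))$ computed from the complementary error function; the function name is an assumption, and the evaluation is only meant for moderate $x$, since the numerator and denominator both tend to zero and suffer from cancellation for very large $x$.

```python
import math


def normal_tail_ratio(x):
    """(1 - x mu + x^2) / (x mu + 1 - mu^2) with mu = f / (1 - F), normal F."""
    f = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    surv = 0.5 * math.erfc(x / math.sqrt(2.0))   # 1 - F(x)
    mu = f / surv
    return (1.0 - x * mu + x * x) / (x * mu + 1.0 - mu * mu)
```

Evaluating `normal_tail_ratio` at increasing moderate values of `x` shows the ratio approaching 2 from above, in line with the asymptotic equivalence stated above.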
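Next, consider the Student $t_k$-distribution with a fixed number of
degrees of freedom $k$. In this case,

$f_k(x) = \frac{1}{\sqrt{\pi k}}\, \frac{\Gamma((k + 1)/2)}{\Gamma(k/2)}\, \frac{1}{(1 + x^2/k)^{(k+1)/2}}, \qquad \psi_{f_k}(x) = \frac{k + 1}{k}\, \frac{x}{1 + x^2/k}, \qquad x \in \mathbb{R}.$

Using asymptotics for $k$ fixed and $x \to \infty$ [cf., e.g., Soms
(1976)], we obtain, with a constant $d_k$ depending only on $k$,

$\Gamma_{F(x)} \sim \frac{d_k}{x^{k+2}} \begin{pmatrix} x^2/k & x \\ x & (k + 1)^2/(k + 2) \end{pmatrix}.$

Consequently,

$h^T(x)\, \Gamma_{F(x)}^{-1} h(x) \sim \frac{2(k + 1)}{d_k}\, x^k \sim \frac{2(k + 1)}{k}\, [1 - F(x)]^{-1}, \qquad x \to \infty,$

or $\gamma^T(s)\, \Gamma_s^{-1} \gamma(s) \sim [2(k + 1)/k]\, (1 - s)^{-1}$,
as $s \to 1$.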

The two values $k = 1$ and $k = 2$ deserve special attention because the
mean and variance do not exist in these two cases. For $k = 1$, one obtains
the standard Cauchy distribution and, as seen above, the transformation per
se remains technically sound and the proposed test to fit the standard
Cauchy distribution is valid as long as $m(x)$ is interpreted as some other
conditional location parameter of $Y$, given $X = x$, such as the
conditional median, and as long as one has an estimator of this $m(x)$
satisfying (1.1). A similar comment applies when $k = 2$.

Finally, let $F$ be the double exponential, or Laplace, d.f. with the
density $f(x) = (a/2)\, e^{-a|x|}$, $a > 0$. For $x > 0$ we get
$h(x) = (1, a)^T$ and $\gamma(s) = (1, a)^T$, and $\Gamma_s$ becomes
degenerate, equal to (2.6). Therefore again, see (2.7) with vector
$b = \gamma(t)$, for $s > 1/2$,
$\gamma^T(s)\, \Gamma_s^{-1} \gamma(s) = (1 - s)^{-1}$.

Next, in this section we wish to clarify the question of a.s. continuity of
$K_n$ and $K$ as linear functionals and thus justify the presence of the
tail condition (2.10). For this purpose it is sufficient to consider the
particular case when $\gamma(s) = 1$ is one-dimensional and
$\Gamma_s = 1 - s$. In this case

$K_n(\varphi) = -\int_0^1 \varphi(s)\, \frac{u_n(s)}{1 - s}\, ds, \qquad K(\varphi) = -\int_0^1 \varphi(s)\, \frac{u(s)}{1 - s}\, ds.$

The proposition below is of independent interest.

Proposition 2.1. (i) $K_n(\varphi)$ is a continuous linear functional in
$\varphi$ on $L_2[0, 1]$ for every finite $n$.

(ii) However, the integral $\int_0^1 u^2(s)/(1 - s)^2\, ds$ is almost
surely infinite. Moreover,

$\frac{1}{-\ln(1 - s)} \int_0^s \frac{u^2(t)}{(1 - t)^2}\, dt \to_p 1 \qquad \text{as } s \to 1.$

Therefore, $K(\varphi)$ is not continuous on $L_2[0, 1]$.

Remark 2.2. It is easy to see that
$E \int_0^1 u^2(s)/(1 - s)^2\, ds = \infty$, but this would not resolve the
question of a.s. behavior of the integral and, hence, of $K$.

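Proof of Proposition 2.1. (i) From the Cauchy-Schwarz inequality
we obtain

$|K_n(\varphi)| \le \left( \int_0^1 \varphi^2(s)\, ds \right)^{1/2} \left( \int_0^1 \left[ \frac{u_n(s)}{1 - s} \right]^2 ds \right)^{1/2},$

and the question reduces to whether the integral
$\int_0^1 [u_n(s)/(1 - s)]^2\, ds$ is a.s. finite or not. However, it is, as
even $\sup_s |u_n(s)/(1 - s)|$ is a proper random variable for any finite
$n$.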

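(ii) Recall that $u(s)/(1 - s)$ is a Brownian motion: if $b$ denotes
standard Brownian motion on $[0, \infty)$, then, in distribution,

$\frac{u(t)}{1 - t} = b\left( \frac{t}{1 - t} \right) \qquad \forall t \in [0, 1).$

Hence, in distribution,

$\int_0^s \frac{u^2(t)}{(1 - t)^2}\, dt = \int_0^s b^2\left( \frac{t}{1 - t} \right) dt = \int_0^\tau \frac{b^2(z)}{(1 + z)^2}\, dz, \qquad \tau = s/(1 - s).$

Integrating the last integral by parts yields

$\int_0^\tau \frac{b^2(z)}{(1 + z)^2}\, dz = -\frac{b^2(\tau)}{1 + \tau} + 2 \int_0^\tau \frac{b(z)}{1 + z}\, db(z) + \int_0^\tau \frac{1}{1 + z}\, dz$

(2.11) $\qquad = -\frac{b^2(\tau)}{1 + \tau} + 2 \int_0^\tau \frac{b(z)}{1 + z}\, db(z) + \ln(1 + \tau).$

Consider the martingale

$M(t) = \int_0^t \frac{b(z)}{1 + z}\, db(z), \qquad t \ge 0.$

Its quadratic variation process is

$\langle M \rangle_t = \int_0^t \frac{b^2(z)}{(1 + z)^2}\, dz.$

Note that $\langle M \rangle_\tau$ equals the term on the left-hand side of
(2.11). Divide (2.11) by $\ln(1 + \tau)$ to obtain

$\frac{\langle M \rangle_\tau}{\ln(1 + \tau)} = -\frac{b^2(\tau)}{(1 + \tau) \ln(1 + \tau)} + \frac{2 M(\tau)}{\ln(1 + \tau)} + 1.$

The equalities

$E M^2(t) = E \langle M \rangle_t = \int_0^t \frac{z}{(1 + z)^2}\, dz = \ln(1 + t) - \frac{t}{1 + t}, \qquad E b^2(t) = t,$

imply that

$\frac{b^2(\tau)}{(1 + \tau) \ln(1 + \tau)} = o_p(1) \quad \text{and} \quad \frac{M(\tau)}{\ln(1 + \tau)} = o_p(1) \quad \text{as } \tau \to \infty.$

Hence, $\langle M \rangle_\tau / \ln(1 + \tau) \to_p 1$, as
$\tau \to \infty$. □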
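The closed form for $E \langle M \rangle_t$ used in the last step of the proof can be checked by simple quadrature. The sketch below is an illustration under the assumption that the integrand is $z/(1 + z)^2$, as follows from $E b^2(z) = z$; the function name and the number of midpoint nodes are hypothetical choices.

```python
import math


def expected_quadratic_variation(t, m=20000):
    """Midpoint-rule value of the integral of z / (1 + z)^2 over [0, t]."""
    step = t / m
    total = 0.0
    for j in range(m):
        z = (j + 0.5) * step
        total += z / (1.0 + z) ** 2
    return total * step


def closed_form(t):
    """ln(1 + t) - t / (1 + t), the value claimed in the proof."""
    return math.log(1.0 + t) - t / (1.0 + t)
```

Note that `closed_form(t) / math.log(1.0 + t)` tends to 1 only at a logarithmic rate, which is consistent with the normalization by $\ln(1 + \tau)$ in the proposition.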

3. Power. Consider, for the sake of comparison, the problem of fitting
a distribution in the one sample location model up to an unknown location
parameter. More precisely, consider the problem of testing that
$X_1, \ldots, X_n$ is a random sample from $F(\cdot - \theta)$, for some
$\theta \in \mathbb{R}$, against the class of all contiguous alternatives,
that is, sequences of alternative distributions $A_n(\cdot - \theta)$
satisfying

$\left( \frac{dA_n(x)}{dF(x)} \right)^{1/2} = 1 + \frac{1}{2\sqrt{n}}\, g(x) + r_n(x), \qquad \int g^2(x)\, dF(x) < \infty, \qquad \int r_n^2(x)\, dF(x) = o\left( \frac{1}{n} \right).$

As is known, and as can intuitively be understood, one should be interested
only in the class of functions $g \in L_2(F)$ that are orthogonal to
$\psi_f$:

(3.1) $\int g(x)\, \psi_f(x)\, dF(x) = 0.$

Indeed, as $g$ describes a functional "direction" in which the alternative
$A_n$ deviates from $F$, if it has a component collinear with $\psi_f$,

$g(x) = g_\perp(x) + c\, \psi_f(x), \qquad \int g_\perp(x)\, \psi_f(x)\, dF(x) = 0,$

then infinitesimal changes in the direction $c\, \psi_f$ will be explained
by, or attributed to, the infinitesimal changes in the value of the
parameter, that is, "within" the parametric family. Hence it cannot (and
should not) be detected by a test for our parametric hypothesis. So, we
assume that $g$ and $\psi_f$ are orthogonal, that is, (3.1).

Since $\theta$ remains unspecified, we still need to estimate it. Suppose
$\hat\theta$ is its MLE under $F$ and consider the empirical process
$\hat v_n$ based on $\hat e_i = X_i - \hat\theta$, $i = 1, 2, \ldots, n$:

$\hat v_n(x) = \sqrt{n}\,[\hat F_n(x) - F(x)], \qquad \hat F_n(x) = \frac{1}{n} \sum_{i=1}^n I\{\hat e_i \le x\}.$

One uses the empirical process $v_n$ in the case one assumes $\theta$ is
known.

It is known [see, e.g., Khmaladze (1979)] that the asymptotic shift of
$v_n$ and $\hat v_n$ under the sequence of alternatives $A_n$ with
orthogonality condition (3.1) is the same and equals the function

$G(x) = \int_{-\infty}^x g(y)\, dF(y).$

However, the process $\hat v_n$ has the uniform asymptotic representation

$\hat v_n(x) = v_n(x) + f(x) \int \psi_f(y)\, dv_n(y) + o_p(1)$

and the main part on the right is an orthogonal projection of $v_n$; see
Khmaladze (1979) for a precise statement; see also Tjurin (1970).
Heuristically speaking, it implies that the process $\hat v_n$ is "smaller"
than $v_n$. In particular, the variance of $\hat v_n(x)$ is bounded above by
the variance of $v_n(x)$, for all $x$. Therefore, tests based on omnibus
statistics, which typically measure an "overall" deviation of an empirical
distribution function from $F$, or of an empirical process from 0, will
have better power if based on $\hat v_n$ than on $v_n$. From a certain
point of view this may seem a paradox, as it implies that, even if we know
the parameter $\theta$, it would still be better to replace it by an
estimator, because the power of many goodness of fit tests will thus
increase. However, note that the integral in the last display has the same
asymptotic distribution under the hypothetical $F$ and the alternatives
$A_n$, and therefore $v_n$ is "bigger" than $\hat v_n$ by a term which is
not useful in our testing problem.

The transformation of this location-model process $\hat v_n$ asymptotically
coincides with the process $w_n$ we study here, and moreover, the
relationship between the two processes is one-to-one. Therefore, any
statistic based on either one of these two processes will yield the same
large sample inference.

With the residual empirical process $\hat v_n$ of the regression model the
situation is the following: although it can be shown that the shift of this
process under alternatives $A_n$ with orthogonality condition (3.1) is
again the function $G$, with a general estimator $\hat m_n$ and, therefore,
the general form of $R_n$, this process is not a transformation of $v_n$
only, and therefore is not its projection. In other words, it is not as
"concentrated" as the residual empirical process of the location model. The
bias part of $R_n$ brings in additional randomization, not useful for the
testing problem at hand. As a result, one will have less power in tests
based on omnibus statistics from $\hat v_n$.

Fig. 1. Null empirical d.f. (red dashed curve) and null limit d.f. (black curve) of $W_n$.

We illustrate this by a simulation study. In this study we chose the
regression model $Y = m(X) + e$, with $m(x) = e^x$, and the covariate $X$ to
be uniformly distributed on $[0, 2]$. Let $F_0$ ($\Psi$) denote the d.f. of
a standardized normal (standardized double exponential) r.v. and $f_0$
($\psi$) denote their densities. The problem is to test $H_0 : F = F_0$
versus the alternatives $H_1 : F \ne F_0$. In the simulation below we chose
a particular member of this alternative: $F_1 = 0.8 F_0 + 0.2 \Psi$. To
estimate $m$, we used the naive Nadaraya-Watson estimator

$\hat m_n(x) = \sum_{i=1}^n Y_i\, I\{X_i \in [x - a, x + a]\} \Big/ \sum_{i=1}^n I\{X_i \in [x - a, x + a]\},$

with $a = 0.04$. We shall compare the two tests based on
$V_n = \sup_x |\hat v_n(x)|$ and $W_n = \sup_x |w_n(x)|$. In all
simulations, $n = 200$, repeated 10,000 times.
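One replication of this design, for the statistic $V_n$ under the null hypothesis, might be coded roughly as follows. This is a sketch under stated assumptions rather than the code used for the study: the residuals are computed at the observed covariates, the supremum is evaluated at the ordered residuals as for a Kolmogorov-Smirnov statistic, and all function names and the seed are hypothetical.

```python
import math
import random


def nw_uniform(x0, xs, ys, a):
    """Naive Nadaraya-Watson estimate at x0 with window [x0 - a, x0 + a]."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        if abs(x - x0) <= a:
            num += y
            den += 1.0
    return num / den   # den >= 1 whenever x0 is one of the observed xs


def normal_cdf(x):
    """Standard normal d.f. F_0."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def sup_stat(residuals, cdf):
    """sqrt(n) * sup_x |F_n(x) - F(x)| for the empirical d.f. of residuals."""
    n = len(residuals)
    d = 0.0
    for i, e in enumerate(sorted(residuals), start=1):
        f = cdf(e)
        d = max(d, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * d


def simulate_vn(n=200, a=0.04, seed=0):
    """One null replication: X ~ U[0, 2], m(x) = exp(x), normal errors."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 2.0) for _ in range(n)]
    ys = [math.exp(x) + rng.gauss(0.0, 1.0) for x in xs]
    resid = [y - nw_uniform(x, xs, ys, a) for x, y in zip(xs, ys)]
    return sup_stat(resid, normal_cdf)
```

Calling `simulate_vn` with different seeds gives draws from the finite-sample null distribution of $V_n$; repeating it 10,000 times would mimic the scale of the study described above.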

First, we generated null empirical d.f.'s of both statistics under the
above set up. As seen in Figure 1, although the sample size $n = 200$ is
not too big, the empirical null d.f. of $W_n$ is quite close to the d.f. of
$\sup_x |b(F_0(x))|$, its limiting distribution. The empirical null d.f. of
$V_n$ is given in Figure 3.
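Since $F_0$ maps the real line onto $(0, 1)$, the limiting statistic is the supremum of the absolute value of standard Brownian motion over $[0, 1]$, whose d.f. has a classical series representation. The sketch below evaluates that series; it is offered as an illustration and is not taken from the paper. The function names, the number of terms and the bisection bounds are assumptions, and the truncated series is intended for arguments in the practically relevant range (roughly below 10).

```python
import math


def sup_abs_bm_cdf(x, terms=50):
    """P(sup_{0<=t<=1} |b(t)| <= x) for standard Brownian motion b.

    Uses the series (4/pi) * sum_k (-1)^k / (2k+1) * exp(-(2k+1)^2 pi^2 / (8 x^2)).
    """
    if x <= 0.0:
        return 0.0
    c = math.pi ** 2 / (8.0 * x * x)
    total = 0.0
    for k in range(terms):
        total += (-1.0) ** k / (2 * k + 1) * math.exp(-((2 * k + 1) ** 2) * c)
    return 4.0 / math.pi * total


def sup_abs_bm_critical_value(alpha, lo=0.1, hi=8.0):
    """Upper-alpha critical value of sup |b(t)| found by bisection."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if sup_abs_bm_cdf(mid) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Asymptotic critical values for $W_n$ obtained this way do not depend on $F_0$ or on the estimator $\hat m_n$, which is the practical meaning of the test being asymptotically distribution free.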

To compare the power of these tests, we generated 160 errors from $F_0$ and
40 from $\Psi$ and used the above set up to compute $V_n$ and $W_n$.
Figure 2 shows the hypothetical normal density $f_0$ versus the alternative
mixture density $f_1 = 0.8 f_0 + 0.2 \psi$. Figure 3 describes the
empirical d.f.'s of $V_n$ under $F_0$ and $F_1$, while Figure 4 gives the
same entities for $W_n$.
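The error sample under the alternative can be generated along the following lines. This is a sketch with assumptions: "standardized double exponential" is read as a Laplace distribution with mean 0 and variance 1 (scale $1/\sqrt{2}$), the counts 160 and 40 are fixed as in the text, and the function name and seed are hypothetical.

```python
import math
import random


def mixture_errors(n_normal=160, n_laplace=40, seed=0):
    """Errors for the alternative: fixed counts from F_0 and from Psi."""
    rng = random.Random(seed)
    errors = [rng.gauss(0.0, 1.0) for _ in range(n_normal)]
    for _ in range(n_laplace):
        sign = 1.0 if rng.random() < 0.5 else -1.0
        # |e| is exponential with rate sqrt(2), so that Var(e) = 1
        errors.append(sign * rng.expovariate(math.sqrt(2.0)))
    rng.shuffle(errors)
    return errors
```

Adding these errors to $m(X_i) = e^{X_i}$ and recomputing the residuals gives one replication under $H_1$.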

Clearly, the alternative we consider, given that the sample size is only
$n = 200$, should indeed not be easy to detect, especially by a test.
Besides, as the difference between $F_0$ and $F_1$ occurs in the "middle"
of the d.f. $F_0$, the alternative $F_1$ is of a nature favorable for
application of the Kolmogorov-Smirnov test based on $\hat v_n$. However,
Figures 3 and 4 show the effect we expected: the distribution of $V_n$
reacts to the alternative, that is, to the presence of double-exponential
errors, less than the distribution of $W_n$.

The above figures were computed with the window width $a = 0.04$. To
assess the effect of window width on the empirical power of these tests, we
computed empirical power for additional values of $a = 0.08, 0.12$, at some
empirical levels $\alpha$. Table 1 presents these numerical power values.
In all cases one sees the empirical power of the $W_n$ test to be larger
than that of the $V_n$ test at all chosen levels $\alpha$, although for
$a = 0.04$, this difference is far more significant than in the other two
cases. Critical values used in this comparison were estimated from their
respective empirical null distributions. These are not isolated findings;
more examples can be found in Brownrigg (2008).

Fig. 2. $f_0$ (dark curve) and $0.8 f_0 + 0.2 \psi$ (red dashed curve).

Fig. 3. Empirical d.f.'s of $V_n$ under $H_0$ (black curve) and $H_1$ (red dashed curve).

Table 1
Empirical power of $V_n$ and $W_n$ tests

f and a            $\alpha$    $V_n$     $W_n$
$f_1$, a = 0.04    0.10        0.1904    0.3168
                   0.05        0.1154    0.1920
                   0.025       0.0625    0.1114
                   0.01        0.0260    0.0523
$f_1$, a = 0.08    0.10        0.1838    0.2115
                   0.05        0.1081    0.1242
                   0.025       0.0680    0.0744
                   0.01        0.0325    0.0450
$f_1$, a = 0.12    0.10        0.1837    0.1960
                   0.05        0.1085    0.1150
                   0.025       0.0619    0.0760
                   0.01        0.0301    0.0480
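Empirical power values of the kind reported in Table 1 can be obtained from simulated statistics in two steps: estimate the critical value from the empirical null distribution, then count exceedances under the alternative. The sketch below shows one plausible way to do this; the quantile convention, the small tolerance guarding against floating point rounding, and the function names are assumptions, since the text does not specify these details.

```python
import math


def empirical_critical_value(null_stats, alpha):
    """Empirical (1 - alpha)-quantile of simulated null statistics."""
    s = sorted(null_stats)
    k = math.ceil((1.0 - alpha) * len(s) - 1e-9) - 1
    return s[max(0, min(k, len(s) - 1))]


def empirical_power(alt_stats, critical_value):
    """Share of alternative replications exceeding the critical value."""
    return sum(1 for v in alt_stats if v > critical_value) / len(alt_stats)
```

For example, with null statistics 1, 2, ..., 100 and level 0.05 the critical value is 95, and alternative statistics 90, 96, 97, 100 give empirical power 0.75.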

Returning to the general discussion on power, we must add that with the
estimator $\hat m_n$ used by Müller, Schick and Wefelmeyer, and therefore,
with their simple form of $R_n$, the residual empirical process $\hat v_n$
of the regression model is again asymptotically a projection, although in
general a skew one, of the process $v_n$. As described in Khmaladze (1979),
it is asymptotically in one-to-one relationship with the residual empirical
process of the location model and, therefore, with $w_n$. Hence, the large
sample inference drawn from a statistic based on $\hat v_n$ is, in this
case, also equivalent to that drawn from the analogous statistic based on
either of the other two, and the only difference between these processes is
that the two residual empirical processes are not asymptotically
distribution free, while $w_n$ is.

Fig. 4. Empirical d.f.'s of $W_n$ under $H_0$ (black curve) and $H_1$ (red dashed curve).

4. Weak convergence of $w_n$. In this section we prove weak convergence
for the process $w_n$, given by (2.2) and (2.3). In view of (2.5), (2.9)
and the fact that the weak convergence of the first part in the right-hand
side of (2.5) was proved in Theorem 2.1, it suffices to show that the
process $\eta_n$ of (2.5) is asymptotically small. Being the transformation
of the "small" process $\xi_n$, the smallness of $\eta_n$ is plausible.
However, the transformation $K(\cdot, \xi_n)$ is not continuous in $\xi_n$
in the uniform metric. Indeed, although in the integration by parts formula

$\int_t^1 \gamma(s)\, d\xi_n(F^{-1}(s)) = \xi_n(F^{-1}(s))\, \gamma(s) \Big|_{s=t}^{1} - \int_t^1 \xi_n(F^{-1}(s))\, d\gamma(s),$

we can show that $\xi_n(F^{-1}(1))\, \gamma(1) = 0$, the integral on the
right-hand side is not necessarily small if $\gamma(t)$ is not bounded at
$t = 1$, as happens to be true for the normal d.f. $F$, where the second
coordinate of $\gamma(t)$ is $F^{-1}(t)$. Therefore, one cannot prove the
smallness of $\eta_n$ in sufficient generality, using only uniform
smallness of $\xi_n$.

If we use, however, a quite mild additional assumption on the right tail of
$\xi_n$, or rather of $\hat v_n$ and $f$, we can obtain the weak
convergence of $w_n$ basically under the same conditions as in Theorem 2.2.
Namely, assume that for some positive $\beta < 1/2$,

(4.1) $\sup_{y > x} \frac{|\hat v_n(y)|}{(1 - F(y))^\beta} = o_p(1) \qquad \text{as } x \to \infty,$

uniformly in $n$. Note that the same condition for $v_n$ is satisfied for
all $\beta < 1/2$. Denote the tail expected value and variance of
$\psi_f(e_1)$ by

$E[\psi_f | x] = E[\psi_f(e_1) | e_1 > x], \qquad \mathrm{Var}[\psi_f | x] = \mathrm{Var}[\psi_f(e_1) | e_1 > x].$

Now we formulate two more conditions on $F$.

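(a) For any $\varepsilon > 0$ the function $\psi_f(F^{-1})$ is of bounded
variation on $[\varepsilon, 1 - \varepsilon]$ and for some
$\varepsilon > 0$ it is monotone on $[1 - \varepsilon, 1]$.

(b) For some $\delta \ge 0$, $\varepsilon > 0$ and some $C < \infty$,

$\frac{(\psi_f(x) - E[\psi_f | x])^2}{\mathrm{Var}[\psi_f | x]} < C (1 - F(x))^{-2\delta} \qquad \forall x : F(x) > 1 - \varepsilon.$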

Note that in terms of the above notation

(4.2) $\gamma(t)^T\, \Gamma_t^{-1} \gamma(t) = \frac{1}{1 - F(x)} \left[ 1 + \frac{(\psi_f(x) - E[\psi_f | x])^2}{\mathrm{Var}[\psi_f | x]} \right], \qquad t = F(x).$

Hence, condition (b) is equivalent to

(4.3) $\gamma(t)^T\, \Gamma_t^{-1} \gamma(t) \le C (1 - t)^{-1 - 2\delta} \qquad \forall t > 1 - \varepsilon.$

Condition (b) is easily satisfied in all examples of Section 2, even with
$\delta = 0$. Our last condition is as follows.

(c) For some $C < \infty$ and $\beta > 0$ as in (4.1),

$\left| \int_x^\infty [1 - F(y)]^\beta\, d\psi_f(y) \right| \le C\, |\psi_f(x) - E[\psi_f | x]|.$

Condition (c) is also easily satisfied in all examples of Section 2, even
for arbitrarily small $\beta$.
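As a consistency check of (4.2), one can work it out for the logistic d.f. The short derivation below is added as an illustration; it uses only that, with $t = F(x)$ and conditionally on $e_1 > x$, the variable $U = F(e_1)$ is uniform on $(t, 1)$, and that $\psi_f = 2F - 1$ in the logistic case.

```latex
\begin{align*}
E[\psi_f \mid x] &= E[2U - 1 \mid U > t] = (1 + t) - 1 = t, \\
\mathrm{Var}[\psi_f \mid x] &= 4\,\mathrm{Var}[U \mid U > t] = \frac{4(1 - t)^2}{12} = \frac{(1 - t)^2}{3}, \\
\psi_f(x) - E[\psi_f \mid x] &= (2t - 1) - t = -(1 - t), \\
\gamma(t)^T \Gamma_t^{-1} \gamma(t)
  &= \frac{1}{1 - t}\left[ 1 + \frac{(1 - t)^2}{(1 - t)^2 / 3} \right]
   = \frac{4}{1 - t}.
\end{align*}
```

This agrees with the value $4(1 - s)^{-1}$ found for the logistic example of Section 2, and it shows that the ratio in condition (b) equals the constant 3 there, so that (b) holds with $\delta = 0$.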

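For example, for the logistic distribution, with $t = F(x)$,
$\psi_f(x) = 2t - 1$ and

$\int_x^\infty [1 - F(y)]^\beta\, d\psi_f(y) = 2 \int_t^1 (1 - s)^\beta\, ds = \frac{2}{\beta + 1}\, (1 - t)^{\beta + 1},$

while $|\psi_f(x) - E[\psi_f | x]| = (1 - t)$ and their ratio tends to 0, as
$t \to 1$. For the normal distribution,

$\int_x^\infty [1 - F(y)]^\beta\, d\psi_f(y) \sim \int_x^\infty \frac{1}{y^\beta}\, f^\beta(y)\, dy \le \frac{1}{x^\beta} \int_x^\infty f^\beta(y)\, dy,$

while

$|\psi_f(x) - E[\psi_f | x]| = \left| x - \frac{f(x)}{1 - F(x)} \right| \sim \frac{1}{x}, \qquad x \to \infty,$

and the ratio again tends to 0, as $x \to \infty$.

Recall the notation

$K(\varphi, \xi_n) = \int_0^1 \varphi(t)\, \gamma(t)^T\, \Gamma_t^{-1} \int_t^1 \gamma(s)\, \xi_n(F^{-1}(ds))\, dt,$

and for a given indexing class $\Phi$ of functions from $L_2[0, 1]$ let
$\Phi \circ F = \{\varphi(F(\cdot)), \varphi \in \Phi\}$.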

Theorem 4.1. (i) Suppose conditions (4.1) and (a)-(c) are satisfied
with $\beta > \delta$. Then, on the class $\Phi_\varepsilon$ as in Theorem
2.1 but with $\alpha < \beta - \delta$, we have

$\sup_{\varphi \in \Phi_\varepsilon} |K(\varphi, \xi_n)| = o_p(1), \qquad n \to \infty.$

Therefore, if $\Phi$ is a Donsker class, then, for every
$\varepsilon > 0$,

$w_n \to_d b$ in $\ell^\infty(\Phi \cap \Phi_\varepsilon \circ F),$

where $\{b(\varphi), \varphi \in \Phi\}$ is standard Brownian motion.

(ii) If, in addition, $\delta < \alpha$, then for the time transformed
process $w_n(F^{-1}(\cdot))$ of (2.2), we have

$w_n(F^{-1}(\cdot)) \to_d b(\cdot)$ in $D[0, 1]$.
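Proof. Note that

$\gamma(t)^T\, \Gamma_t^{-1} (0, a)^T = \frac{1}{1 - F(x)}\, \frac{(\psi_f(x) - E[\psi_f | x])\, a}{\mathrm{Var}[\psi_f | x]}, \qquad t = F(x), \quad \forall a \in \mathbb{R}.$

Use this equality for $a = \int_t^1 (1 - s)^\beta\, d\psi_f(F^{-1}(s))$.
Then condition (c) implies that

(4.4) $|\gamma(t)^T\, \Gamma_t^{-1} (0, a)^T| \le C\, \gamma(t)^T\, \Gamma_t^{-1} \gamma(t) \qquad \forall t < 1.$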



Now we prove the first claim. 

(i) Use the notation $\xi'_n(t) = \xi_n(x)$ with $t = F(x)$. Since we expect singularities
at $t = 0$ and, especially, at $t = 1$ in both integrals in $K(\varphi, \xi_n)$, we will
isolate the neighborhoods of these points and consider them separately. Mostly
we will take care of the neighborhood of $t = 1$. The neighborhood of $t = 0$
can be treated more easily (see below). First assume $\Gamma_t^{-1}$ nondegenerate for
all $t < 1$. Then,



Consider the third summand on the right-hand side. First note that, when
proving that it is small, we can replace $\xi_n$ by the difference $\hat v_n - v_n$ only.
Indeed, since $d f(F^{-1}(s)) = \psi_f(x) f(x)\, dx$, according to (2.4) the integral



is the second coordinate of $\int_{1-\varepsilon}^1 \varphi(t)\gamma(t)\, dt$, and is small for $\varepsilon$ small anyway.
Monotonicity of $\psi_f(F^{-1})$ guaranteed by assumption (a) and (2.1) justify
integration by parts of the inner integral in the following derivation.




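$$\begin{aligned}
\left|\int_{1-\varepsilon}^1 \varphi(t)\,\gamma(t)^T\Gamma_t^{-1}\gamma(t)\,v_n(t)\,dt\right|
 &\le C\int_{1-\varepsilon}^1 \frac{[\gamma(t)^T\Gamma_t^{-1}\gamma(t)]^{1/2}}{(1-t)^{1/2+\alpha-\beta}}\,dt\ \sup_{t>1-\varepsilon}\frac{|v_n(t)|}{(1-t)^\beta} \\
 &\le C\int_{1-\varepsilon}^1 \frac{1}{(1-t)^{1+\alpha+\delta-\beta}}\,dt\ \sup_{t>1-\varepsilon}\frac{|v_n(t)|}{(1-t)^\beta},
\end{aligned} \tag{4.5}$$

which is small for small $\varepsilon$ as soon as $\alpha < \beta - \delta$.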
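The integration by parts used for the inner integral can be sketched as follows. This is a reading of the argument rather than a quotation of it: it assumes that the boundary term $\gamma(s)v_n(s)$ vanishes as $s \to 1$.

```latex
\[
\int_t^1 \gamma(s)\, v_n(ds)
 = \bigl[\gamma(s)\, v_n(s)\bigr]_{s=t}^{s=1} - \int_t^1 v_n(s)\, d\gamma(s)
 = -\gamma(t)\, v_n(t) - \int_t^1 v_n(s)\, d\gamma(s).
\]
```

Under this reading, the first term on the right produces the single integral bounded in the display preceding this sketch, and the second term produces a double integral in $v_n(s)\,d\gamma(s)$.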

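Now, note that $\int_t^1 v_n(s)\, d\gamma(s) = \left(0, \int_t^1 v_n(s)\, d\psi_f(F^{-1}(s))\right)^T$. Using monotonicity
of $\psi_f(F^{-1})$ for small enough $\varepsilon$ we obtain, for all $t > 1 - \varepsilon$,

$$\left|\int_t^1 v_n(s)\, d\psi_f(F^{-1}(s))\right| \le \int_t^1 (1-s)^\beta\, d\psi_f(F^{-1}(s))\ \sup_{s>1-\varepsilon}\frac{|v_n(s)|}{(1-s)^\beta}. \tag{4.6}$$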



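Therefore, using (4.4), for the double integral we obtain

$$\left|\int_{1-\varepsilon}^1 \varphi(t)\,\gamma(t)^T\Gamma_t^{-1}\int_t^1 v_n(s)\, d\gamma(s)\, dt\right| \le C\int_{1-\varepsilon}^1 |\varphi(t)\,\gamma(t)^T\Gamma_t^{-1}\gamma(t)|\, dt\ \sup_{s>1-\varepsilon}\frac{|v_n(s)|}{(1-s)^\beta},$$

and the integral on the right-hand side, as we have seen above, is small as
soon as $\alpha < \beta - \delta$. The same conclusion is true for $v_n$ replaced by $\hat v_n$.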
Since (4.6) implies the smallness of

$$\int_{1-\varepsilon}^1 v_n(s)\, d\psi_f(F^{-1}(s)) \quad \text{and} \quad \int_{1-\varepsilon}^1 \hat v_n(s)\, d\psi_f(F^{-1}(s)),$$

to prove that the middle summand on the right-hand side of (4.5) is small
one needs only finiteness of $\psi_f(x)$ in each $x$ with $0 < F(x) < 1$, which follows
from (a). This and the uniform in $x$ smallness of $\xi_n$ prove smallness of the first
summand as well.

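The smallness of integrals

$$\int_0^\varepsilon \varphi(t)\,\gamma(t)^T\Gamma_t^{-1}\int_t^1 \gamma(s)\,\xi'_n(ds)\, dt$$

follows from $\Gamma_t^{-1} \sim \Gamma_0^{-1}$ and square integrability of $\varphi$ and $\gamma$.
If $\Gamma_t^{-1}$ becomes degenerate after some $t_0$, for these $t$ we get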



t 

and the smallness of all tail integrals easily follows for our choice of the
indexing functions $\varphi$.
(ii) Since for $\delta < \alpha$ the envelope function $\Psi(t)$ of (2.10) satisfies inequality

$$\Psi(t) \ge (1-t)^{\delta-\alpha},$$

it has positive finite or infinite lower limit at $t = 1$. But then it is possible
to choose as an indexing class the class of indicator functions $\varphi(t) = I\{t \le \tau\}$
and the claim follows. □

Remark 4.1 (Computational formula). We present here a computational
formula for $w_n$. Let $G(x) = \int_{y \le x} \Gamma_{F(y)}^{-1} h(y)\, dF(y)$. Then, using (2.3)
and (2.4) one obtains

$$w_n(x) = n^{-1/2}\sum_{i=1}^n \left[I(\hat e_i \le x) - h(\hat e_i)^T G(x \wedge \hat e_i)\right], \qquad x \in \mathbb{R}.$$

Thus to implement the test based on $\sup_x |w_n(x)|$, one needs to evaluate $G$ and
compute $\max_{1 \le j \le n} |w_n(\hat e_{(j)})|$, where $\hat e_{(j)}$, $1 \le j \le n$, are the order statistics
of $\hat e_j$, $1 \le j \le n$.
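A minimal sketch of this computation, assuming the hypothesized $F$ is the standard logistic d.f., is given below. For that $F$, with $u = F(\hat e_i)$ and $m = \min(F(x), u)$, a direct calculation (made here, not taken from the paper) gives $h(\hat e_i)^T G(x \wedge \hat e_i) = 6m(1-u)/(1-m) + 2\log(1-m)$. The function names are hypothetical, and the simulated logistic sample in the usage lines merely stands in for residuals from a nonparametric regression fit.

```python
import math
import random


def logistic_cdf(x):
    """Hypothesized d.f. F (standard logistic, assumed for this sketch)."""
    return 1.0 / (1.0 + math.exp(-x))


def h_dot_G(u, m):
    """h(e)^T G(x ^ e) for logistic F, with u = F(e) and m = min(F(x), u)."""
    return 6.0 * m * (1.0 - u) / (1.0 - m) + 2.0 * math.log1p(-m)


def w_n_at(tau, us):
    """w_n evaluated at the point x with F(x) = tau, given us = [F(e_i)]."""
    total = 0.0
    for u in us:
        m = min(tau, u)
        total += (1.0 if u <= tau else 0.0) - h_dot_G(u, m)
    return total / math.sqrt(len(us))


def sup_statistic(residuals):
    """max_j |w_n(e_(j))| over the order statistics of the residuals (O(n^2) sketch)."""
    us = sorted(logistic_cdf(e) for e in residuals)
    return max(abs(w_n_at(t, us)) for t in us)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in "residuals": an i.i.d. logistic sample drawn by inversion.
    sample = [math.log(v / (1.0 - v)) for v in (rng.random() for _ in range(200))]
    print(sup_statistic(sample))
```

The quadratic-time loop keeps the sketch close to the displayed formula; for large $n$ the sums over $u_i \le \tau$ and $u_i > \tau$ could be accumulated in a single pass over the sorted values.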

Remark 4.2 (Testing with an unknown scale). Here, we shall describe
an analog of the above transformation suitable for testing the hypothesis
$H_{sc}$ that the common d.f. of the errors is $F(x/\sigma)$, for all $x \in \mathbb{R}$, and for
some $\sigma > 0$. Let $\phi_f(x) = 1 + x\psi_f(x)$ and $h_\sigma(x) = (1, \sigma^{-1}\psi_f(x), \sigma^{-1}\phi_f(x))^T$.
Then the analog of the vector $h(x)$ here is $h_\sigma(x/\sigma)$ and that of $\Gamma_t$ is

$$\Gamma_{t,\sigma} = \int_x^\infty h_\sigma(y)h_\sigma(y)^T\, dF(y), \qquad t = F(x).$$

This is the same matrix as given in Khmaladze and Koul (2004), page 1013.
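Akin to the function $K(x, \nu)$ define

$$K_\sigma(x, \nu) = \int_{-\infty}^{x/\sigma} h_\sigma^T(y)\,\Gamma_{F(y),\sigma}^{-1}\int_{y/\sigma}^\infty h_\sigma(z)\, d\nu(z\sigma)\, dF(y), \qquad x \in \mathbb{R}.$$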

An analog of Lemma 2.1 continues to hold for each $\sigma > 0$, and hence this function
is well defined for all $x \in \mathbb{R}$, $\sigma > 0$.

Let $\hat\sigma$ be an $n^{1/2}$-consistent estimator of $\sigma$ based on $\{(X_i, Y_i), 1 \le i \le n\}$.
Let $\tilde F_n(x)$ be the empirical d.f. of the residuals $\tilde e_i = \hat e_i/\hat\sigma$ and let $\tilde v_n = n^{1/2}[\tilde F_n - F]$.
Then the analog of $w_n$ suitable for testing $H_{sc}$ is

$$\tilde w_n(x) = n^{1/2}\left[\tilde F_n(x) - K_{\hat\sigma}(x, \tilde F_n)\right] = \tilde v_n(x) - K_{\hat\sigma}(x, \tilde v_n).$$

Under conditions analogous to those given in Section 4 above, one can verify
that the conclusions of Theorem 4.1 continue to hold for $\tilde w_n$ also.
If we let $G_\sigma(x) = \int_{y \le x/\sigma} \Gamma_{F(y),\sigma}^{-1} h_\sigma(y)\, dF(y)$, then one can rewrite

$$\tilde w_n(x) = n^{-1/2}\sum_{i=1}^n \left[I(\tilde e_i \le x) - h_{\hat\sigma}(\tilde e_i)^T G_{\hat\sigma}(x \wedge \tilde e_i)\right], \qquad x \in \mathbb{R}.$$

Hence, $\sup_x |\tilde w_n(x)| = \max_{1 \le j \le n} |\tilde w_n(\tilde e_{(j)})|$, where $\tilde e_{(j)}$, $1 \le j \le n$, are the order
statistics of $\tilde e_j$, $1 \le j \le n$.